UEval
tb run -d UEval==1.0 -a "<agent>" -m "<model>"
Loading tasks...
Website template modified from https://www.tbench.ai/.